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ABSTRACT 

This  paper  reviews  several  different  methods  for  identifying  the  orders 
of  autoregressive-moving  average  models  for  time  series  data.  The  case  is 
made  that  these  have  a  common  basis,  and  that  a  unified  approach  may  be  found 
in  the  analysis  of  a  matrix  G,  defined  to  be  the  covariance  matrix  of 
forecast  values. 

The  estimation  of  this  matrix  is  considered,  emphasis  being  placed  on  the 
use  of  high  order  autoregression  to  approximate  the  predictor  coefficients. 
Statistical  procedures  are  proposed  for  analysing  G,  and  identifying  the 
model  orders. 

A  simulation  example  and  three  sets  of  real  data  are  used  to  illustrate 
the  procedure,  which  appears  to  be  very  useful  as  a  tool  for  order 
identification  and  preliminary  model  estimation. 
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SIGNIFICANCE  AND  EXPLANATION 


« 


The  prediction  of  a  sequence  of  autocorrelated  observations  is  generally 
facilitated  by  the  use  of  autoregressive-moving  average  (ARMA)  models*  These 
represent  the  observations  in  terms  of  simple  recurrence  relations.  The 
numbers  of  terms  in  these  equations  (the  orders)  and  the  coefficients  (or 
parameters)  need  to  be  estimated  from  the  observations.  This  is  a  complex 
problem  which  generally  requires  nonlinear  optimisation  for  a  range  of 
possible  orders. 

Considerable  effort  has  been  devoted  in  recent  years,  to  methods  for 
obtaining  rapid  and  moderately  efficient,  though  necessarily  approximate, 
solutions  to  this  problem.  The  paper  reviews  the  work  in  this  area,  and 
proposes  a  unified  procedure  which  relies  on  detecting  the  singularity  of  a 
certain  matrix.  This  is  the  covariance  matrix  of  the  predictions,  and  it  may 
be  estimated  by  fitting  a  (linear)  high  order  autoregressive  model  to  the 
data. 

Examples  indicate  that  this  approach  is  remarkably  successful  in 
providing  good  preliminary  estimates  of  the  ARMA  model  orders  and  parameters. 


summary  lies  with  MRC,  and  not  with  the  authors  of  this  report. 


A  UNIFIED  APPROACH  TO  ARMA  MODEL  IDENTIFICATION 
AND  PRELIMINARY  ESTIMATION 


G.  Tunnicliffe  Wilson  and  D.  Piccolo* 

1.  INTRODUCTION 

Since  the  work  by  Box  and  Jenkins  (1970),  henceforth  referred  to  as  BJ,  there  has  been 
a  renewed  and  continuing  interest  in  improved  methods  of  characterising  the  orders,  and 
obtaining  good  preliminary  estimates  for  the  parameters  of  ARMA  models.  Following  the 
notation  of  BJ  let  us  therefore  consider  a  zero-mean  stationary  times  series  xfc,  which  is 


believed  to  follow  an  ARMA  (p,q)  model: 

x_  «  4,x,_  .  +  ...  t  ♦  +  a,.  -  8,a^  i. 

t  1  t-1  p  t-p  t  1  t-1  q  t-q 

In  theory,  the  orders  p,q  and  parameters  4,6  can  be  determined  by  maximum 


(1.1) 


likelihood  estimation  of  the  parameters  for  increasing  values  of  the  orders.  This  is  time 
consuming,  because  nonlinear  estimation  methods  are  necessary  whenever  q  >  0.  In  this 
case,  quick  techniques  for  identifying  model  orders  and  supplying  preliminary  parameter 
estimates  are  valuable  even  if  they  are  not  fully  efficient,  and  on  occasions  might  be 
inadequate.  These  identification  methods  are  usually  based  on  relatively  rapid  and  direct 
computations  deriving  from  sample  autocorrelations  and  linear  regression.  The  aim  of  this 
paper  is  to  put  the  case  that  many  of  the  procedures  which  have  been  advocated  since  the 
appearance  of  BJ,  and  also  many  of  the  earlier  methods  -  particularly  those  due  to  Durbin 
(1959,  I960)  -  may  be  viewed  as  having  essentially  the  same  basis.  This  leads  to  a 
practical  procedure  which  is  very  similar  in  its  motivation  to  the  implementation  of  the 
work  of  Durbin  (1960)  as  presented  recently  by  Hannan  and  Rissanen  (19B2)  -  henceforth 
referred  to  as  HR.  We  believe  however  that  is  has  some  advantages,  and  may  appeal  to  the 
practitioner  of  time  aeries  analysis .  We  end  the  paper  with  a  discussion  of  how  a  simple 
regression  may  furnish  estimates  of  ARMA  model  parameters  which  are  very  close  to  the 
maximum  likelihood  estimates. 
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2.  PROPERTIES  OF  THE  MODEL 


Reader*  of  BJ  will  be  familiar  with  three  very  similar  equations  which  follow  directly 
from  the  model  (1.1),  each  of  which  also  characterises  the  ARMA  model.  The  first  of  these 
is  (BJ  p.  75) 

Pj  -  +  •••  +  ♦pPj-p  for  i  >  9  (2.1) 

where  is  the  autocorrelation  function  (acf)  of  xfc. 

The  pattern  of  the  acf  for  j  >  q  -  p  is  a  mixture  of  damped  exponential  and 
sinusoidal  terms.  Inspection  of  the  sample  acf  with  a  view  to  recognising  such  a  pattern 
is  an  important  part  of  ARMA  modelling.  Increasingly,  there  is  a  desire  for  automatic 
recognition  of  this  pattern  by  detecting  for  which  values  of  p  and  q  the  linear 
relationship  (2.1)  holds. 

The  second  equation  is  (BJ  p.  139) 

A  A  A 

x  (j)  -  ♦  .x  (j  -  1)  +  •••  +  ♦  x  (j  -  p)  for  j  >  q  (2.2) 

n  1  n  p  n 

where  x^( j )  is,  for  fixed  forecast  origin  n  and  increasing  lead  time  j,  the  forecast 
function  of  the  series,  i.e. 

A 

xn(j)  -  E(xn+j|P)  (2.3) 

where  P  is  a  set  of  past  variables.  Although  the  past  is  usually  taken  to  be  the  whole 

of  the  set  xn',tn_i*xn_2  •••  th®  equation  still  holds  if  P  is  any  subset  of  these.  In  e 

sense,  (2.2)  is  more  fundamental  than  (2.1)  because  by  taking  P  -  {x  },  we  get 

n 

x  ())  -  p,x  by  simple  regression,  and  substituting  this  into  (2.2)  leads  to  (2.1). 
n  j  n 

In  recent  years,  following  very  such  on  the  work  of  Akaike  (1974),  th*  structure  of 

the  predictor  space  of  random  variables  defined  by  the  forecast  function,  has  been  seen  to 

have  an  important  role  in  characterising  the  ARMA  model.  See  also  Massali  (1902)  for  work 

which  is  closely  related  to  our  own  use  of  this  concept.  For  an  ARMA  (p,q)  model,  (2.2) 

implies  that  this  space  has  a  finite  dimension  even  if  the  past  P  is  infinite,  because 

for  J  >  q,  x  (j)  is  a  linear  function  of  previous  values.  Define  G  to  be  th* 
n 

prediction  covariance  matrix  of  x„( j )  for  j  ■  1  ...  K,  where  K  is  some  relatively 
large  (and  for  the  time  being  arbitrary)  upper  limit  on  the  forecast  function  length.  W* 
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where  now  P  is  the  infinite  past  (M  *  •).  Define  further  the  partitioned  matrix 


and  vector 


Then  ■  t'a 


and 


**  *  “  (*n+k 


“n+1  1  an  *n-1  * 


and  x^  ■  Tp*p,  so  that 


-  fro2  -  (f •?  +  f r?  >o2 
rr  a  FT  P  P  a 


G 


E{xr^,) 


VILcf* 
P  P  a 


rr 


2 

T'T  a 
r  r  a 


(2.9) 


(2.10) 


(2.11) 

(2.12) 


Finally,  defining  the  vector  of  length  max(p,q)  +  1, 


(2.13) 


we  note  that  the  three  equations  (2.1),  (2.2)  and  (2.8)  to  which  we  have  attached 
Importance,  may  be  presented  for  K  -  max(p,q)  +  1  as 
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(2.14) 


E(xrxp*  -  G4  -  0  (2.15) 

tp*  -  0  .  (2.16) 

estimation  of  the  matrices  in  these  equations,  and  detection  of  the  order  p  at  which 

the  implied  rank  deficiencies  occur  in  (2. 14)-(2. 16),  serves  both  to  identify  max(p,q) 

and  to  provide  preliminary  estimates  of  4 •  We  now  examine  how  previous  methods  relate  to 

this.  For  present  convenience  we  shall  use  p  in  the  place  of  max(p,q),  and  extend  one 

or  the  other  of  4.,  8.  up  to  order  p  with  zeros.  We  assume  that  one  of  4  or  9  is 

j  j  P  P 

non-zero  and  that  4(B)  and  8(B)  have  no  common  factors.  We  shall  refer  to  this  as  the 
assumption  that  p  »  q. 

3.  ARRAY  METHODS 

Biquin,  Gourieroux  and  Monfort  (1980)  proposed  a  corner  method  in  which  an  array  of 
determinants  is  scanned  for  a  set  of  zeros.  This  is  closely  related  to  the  preliminary 
estimation  procedure  qiven  in  the  appendix  p.  499  of  BJ,  where  a  shifted  set  of  Yule-Walker 
equations  is  solved  for  4,  •••  4p.  The  matrix  in  these  equations  is  that  used  by  Beguin 
et  al.,  and  when  both  p  and  q  exceed  their  true  values  the  matrix  is  singular, and  the 
determinant  zero.  In  the  case  p  «  q  this  matrix  is  simply  with  M  »  K  •  p.  Gray, 

Kelley  and  Mclntlre  (1978)  also  use  an  array,  and  a  statistic  which  is  closely  related  to 
the  above  determinant,  being  in  fact  a  ratio  of  two  determinants.  They  look  for  a  change 
from  a  relatively  constant  pattern  of  statistics,  to  an  undefined  or  erratic  set  when  p 
and  q  become  too  large.  Woodward  and  Grey  (1981)  relate  this  statistic  to  the  methods  of 
BJ. 

Again,  Glasbey  (1982)  proposes  a  related  test  statistic  which  similarly  becomes 
undefined  when  p  and  q  become  too  large. 

The  problems  with  these  methods  derives  from  the  fact  that  when  sample  correlations 
are  used  to  estimate  the  matrix  TpF>  it  is  unlikely  to  he  exactly  singular  and  it  is 
difficult  to  derive  statistical  tests  for  this  hypothesis.  We  suggest  that  it  should  be 
more  efficient  to  look  for  rank  deficiency  in  TpF  when  K  “  p  ”  1  and  M  is  large. 
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This  requires  combining  the  information  in  the  rows  of  T, 


and  this  is  precisely  what  is 


achieved  in  the  matrix  G. 


PF' 


4.  CANONICAL  CORRELATION  ANALYSIS 

Investigation  of  the  prediction  space  structure  by  analysis  of  the  correlation  between 
the  future  (F)  and  the  past  (P)  was  advocated  by  Akaike  (1976).  He  chose  to  use  the 
tool  of  canonical  correlation  analysis.  This  recognises  that  (in  the  case  p  *  q)  the 
linear  combination  of  future  values: 

Vj  ■  Vn+j-1  •  •"  -  Vn+)-p  *  f°r  j  *  P 

is  uncorrelated  with  any  set  of  variables  in  the  past.  The  procedure  is  to  choose  a 
sufficiently  large  but  fixed  number  M  of  values  in  the  past,  and  an  increasing  number 
K  «  1,2...  values  in  the  future  until  a  zero  canonical  correlation  is  found.  At  this 
point  K  ■  p  +  1,  and  the  coefficients  in  the  canonical  factor,  after  scaling,  supply  the 
values  of  ^  ...  All  that  is  required  is  an  estimate  of  the  covariance  matrix  T 

partitioned  as  in  (2.4).  This  is  then  submitted  to  a  standard  routine.  It  is  shown  in 
Anderson  (1958)  p.  296  that  the  analysis  seeks  for  eigenvalues  X^  of  a  reduced  matrix 
system  which  in  our  notation  is 

(G  -  X2r  )v  -  0  .  (4.1) 

rr 

Because  the  hypothesis  is  that  the  smallest  eigenvalue  is  zero,  we  are  again  in  a 
situation  of  detecting  a  singularity  of  G. 


5.  REGRESSION  USING  ESTIMATED  INNOVATIONS 

The  ARMA  model  (1.1)  appears  like  a  regression  equation  except  that  the  lagged 
variables  ^  on  the  RHS  are  unobserved.  Durbin  (1960)  suggested  that  estimates  of 
these  terms  may  be  obtained  by  first  fitting  a  suitably  high  order  regression  to  x^,  i.e. 

«t  “  "1*1-1  ♦  •••  ♦  "M*t-M  +  °t  <5*1> 

In  theory,  and  in  the  limit  M  ♦  •,  these  coefficients  are  the  «  weights  in  the 
expansion  of  *(B)  -  4(B) /9(B),  and  the  residuals  are  the  model  innovations  afc.  In 

practice,  sample  correlations  are  used  with  the  Levinson-Ourbln  algorithm  to  estimate  the 


coefficients,  which  should  then  be  referred  to  as  *  ,  ...  w  ,  and  the  residuals  a 

N|  1  Mf  H  t 

should  be  distinguished  from  . 

A  regression  is  then  carried  out  following  (1.1),  using  both  past  values  of  xt  and 


xt  ■  ♦iVi +  ••• +  Vt-p  “  eiVi 


-9a  +  e. 

q  t-q  t 


(S.2) 

where  efc  is  the  error  in  this  regression.  As  a  method  of  parameter  estimation  this  is 
very  useful,  giving  in  theory  consistent,  though  not  fully  efficient  estimates.  As  a 
method  of  selecting  the  orders,  the  regression  can  be  carried  out  for  increasing  p  and 
q  until  no  further  appreciable  reduction  in  residual  sum  of  squares  is  obtained.  HR  show 
how  this  can  be  done  in  a  consistent  manner  by  minimising  an  appropriate  criterion. 

Tsay  and  Tiao  (1983a)  use  a  similar  regression  to  (5.2)  using  residuals  from  a 
regression  such  as  (5.1),  but  the  residuals  they  use  are  from  equations  of  differing 
orders.  They  use  in  effect  the  minimum  number  M  of  past  values  in  (5.1),  to  ensure 
consistency  of  the  estimates  of  ♦  •  The  estimates  of  8  in  (5.2)  are  in  any  case  only 
consistent  in  the  limit  M  -»  Tsay  and  Tiao  are  not  primarily  concerned  with  estimation 
of  8  and  do  not  even  refer  to  the  coefficients  of  the  residuals  as  moving  average 
parameters.  They  do  however  use  ordinary  least  squares  in  all  their  regressions,  which 
allows  the  method  to  be  used  in  the  case  of  non-stationary  ARMA  processes. 

Also  recently.  Young,  Jakeman  and  NcMur tries  (1980)  have  reported  methods  of  order 
identification  in  transfer  function  modelling  which  use  the  trace  of  the  covariance  matrix 
of  the  parameter  estimates  as  a  criterion.  When  this  trace  becomes  large,  the  model  is 
taken  to  be  over  parameter! sed,  i.e.  the  orders  are  too  large.  Young  has  reported  to  us 
the  results  of  applying  this  method  to  ARMA  modelling  by  fitting  a  transfer  function 


between  and  xt,  i.e. 


♦  (B) 

®t  ■  8(B)  xt  ut  ' 


(5.3) 


where  are  the  residuals  already  defined  by  a  regression  such  as  (5.1).  An 

introduction  to  the  instrumental  variable  estimation  methods  used,  is  given  by  Young 
(1974).  In  essence,  they  are  in  this  context  equivalent  to  estimating  the  parameters  in 


the  linear  regression 


(5.4) 


x*  -  ■  *.*•  -  8  n*  -  ...  -  8  a*  +  f 

t  t  1  t-1  p  t-p  1  t- l  q  t-q  t 

where  f,.  is  the  error,  end  x*,  a*  ere  filtered  versions  of  x  ,  a  ,  i.e. 

t  t  t  w  t 

xj  -  L(B)xt,  a*  -  L(B)Ot  . 

Fro*  (5.3)  the  ideal  choice  for  L(B)  is  1/6(B)  where  the  true  (but  unknown)  value 

of  0(B)  should  be  used.  The  first  stage  is  to  set  L(B)  “  1,  with  the  intention  of 

using  the  estiaates  of  0(B)  froa  this  stage,  to  define  L(B)  in  a  further  stage.  For 

the  present  we  take  L(B)  “  1,  giving  xj  “  xfc  and  a*  “  »t,  so  that  (5.4)  differs  froa 

(5.2)  only  in  the  presence  of  on  the  LHS. 

We  now  deal  with  this  point,  and  relate  to  previous  sections,  by  noting  that  is 

by  construction  orthogonal  to  xfc_^  ...  *t_p»  and  provided  M  is  sufficiently  large,  also 

to  a  ....  a  .  Its  presence  in  (5.4)  does  not  therefore  affect  the  estimates  of  ♦ 
t- 1  t-q 

and  0  -  it  aay  even  be  included  in  the  regression  on  the  RHS.  The  estimated  residual 

2  2.2 

variance  0^  is  however  Mailer  than  o^  by  the  amount  o^i 

2  2  2 
0-0  -  0  . 
f  e  a 

2  2  2 

Thus  as  p  and  q  are  increased  to  obtain  a  better  fit,  of  ♦  0  as  ae  *  aa'  rt 
of  interest  also  to  compare  hb  who  use  a  bivariate  autoregression  on  to  obtain 

the  parameters  in  (5.2).  They  monitor  the  determinant  of  the  bivariate  error  covariance 
matrix  which  may  be  seen  to  be  approximately 


2,  2  2,  2  2 

c  (0  -  o  )  -  a  a 

a  e  a  a  f 


They  also  note  that  this  may  approach  zero. 

Considering  now  the  case  p  ■  q  in  (5.2),  we  point  out  that  because  ...  °^._p  are 
orthogonal,  it  is  a  simple  matter  to  orthogonalise  xfc  ...  xfc_p  with  respect  to  these 
variables.  In  theory,  in  the  limit  as  M  ♦  *  afc.  The  result  is  a  corrected  set  of 

variables  given  by  the  regression  coefficients  ♦ ^  of  xfc  on  past  values  at-j! 
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xt  “  xt  '  <at  +  Vt-1  + 
Vi  *  Xt-1  ■  <at-i  •  *1°t-2  + 


x„  ”  x, 
t-p  t-p 


+  ♦  a.  ) 
p  t-p 


+  i|i  ,a  ) 

p-1  t-p 


t-p 


(5.6) 


These  corrected  variables  are  simply  for  j  ”  p  +  1  ...  1.  Substituting 
from  (5.6)  into  the  regression  (5.2)  then  results  in  a  partial  regression  equation  in  the 
forecast  function  only: 


6  x  + 
9i  t-i 


+  * 

p  t-p 


♦  et 


The  coefficients  4^  nay  be  obtained  by  inversion  of  11(B),  i.e. 


'i  Vj-i  *  "*  ■  '  "j 

and  the  parameter  values  6  in  (5.2)  recovered  from  the  substitution  by 


♦  it 


(5.7) 


(5.8) 


6 


i 


-<* 


J 


-  ♦j-i»i  ‘♦j) 


or 


(5.9) 


Our  main  point  is  that  the  regression  (5.7)  depends  only  upon  the  covariance  matrix 

G  of  x  ...  x  and  therefore  relates  to  the  procedures  in  the  foregoing  sections, 

t  t-p 

If  p  is  increased  so  that  G,  with  K  «  p  +  1  becomes  effectively  singular  (according 
to  some  statistical  criterion)  then  p  is  the  order  selected.  In  the  procedure  of  HR  this 

is  equivalent  to  a  zero  determinant  in  (5.5).  In  the  procedure  of  Young  the  trace 

criterion  is  proportional  to  the  trace  of  the  inverse  of  the  covariance  matrix  of  the 
regressions,  i.e. 


/rFF  *  V1 

U\  * ) 
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The  determinant  of  thie  matrix  is  the  eame  as  det  G,  and  the  trace  of  the  inverse 
will  be  dominated  by  the  reciprocal  of  the  smallest  eigenvalue  of  G.  Near  singularity 


of  G  is  therefore  associated  with  large  trace  values. 

With  the  matrix  G  figuring  so  prominently  in  the  methode  we  have  reviewed,  we  now 
consider  how  best  to  estimate  it  from  a  data  sample,  and  what  statistical  analysis  to  apply 
in  its  investigation. 


6.  ESTIMATION  OF  THE  PREDICTION  COVARIANCE  MATRIX  G 

Given  a  finite  sample  x^  ...  x^  it  is  natural  to  use  the  sample  autocorrelations 
2  2 

r,  «  c,  /s  where  s  »  c.  and  for  k  >  0 
k  k  x  x  0 


-1 


N-k 

1 

1 


<xt  -  x)(xt+k 


-  x) 


(6.1) 


Substituting  iy  for  in  T  as  defined  and  used  in  (2.4)~(2.6)  then  eneuree  that 

T  is  positive  definite.  So  also  then  is  G  as  defined  in  (2.7).  A  selection 
(considered  later)  of  the  length  M  of  past  values,  and  K  of  future  values  is  required 
when  (2.7)  is  used.  If  M  is  large  the  matrix  inversion  in  (2.7)  can  be  avoided  by  using 
the  algorithm  in  IXirbin  (1960)  for  k  ■  1  ...  L,  where  l  ■  I  ♦  M  -  1.  This  starts  with 


Tg  »  1  and  proceeds  by  the  equations 


k  ,k 

"k.j 

T. 


<rk  "  wk-1,1rk-1 


”  Wk-1,k-1r1>/Tk-1 


*k-1,j  "k,k"k-1,k-J  '  ^ 


1  ...  k  -  1 


Vi<’ 


Vk* 


(6.2) 


Setting 


(6.3) 
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I 


I 


and 


D 


0 


(6.4) 


tha  value  of  6  dafinad  by  (2.7)  la  praclaaly 

e  -  rrp  -  o*  SOS'  ,  (6.5) 

where  6  ■  0-1  la  economically  obtalnad  bacauaa  0  la  triangular. 

Wa  shall  call  this  tha  dlract  aatlaata  of  G.  Cybanko  (1980)  providaa  assuranca  that 
tha  abova  procadura  la  numerically  wall  conditlonad. 


An  alternative  la  to  atop  tha  racuralons  In  (6.2)  at  nu  valua  of  L  ■  M  aay ,  and 

(dropping  tha  first  suffix  L)  to  vlaw  tha  resulting  valuaa  ...  t  aa  estimates  of  tha 

trua  w-vaights  of  tha  AIWA  nodal,  than  using  (5.8)  suppliaa  satins taa  of  ...  fron 

2 

which  Tp  and  G  ara  constructed  aa  in  (2.9)  and  (2.12).  Tha  estlixate  of  is 

o2  ”  c.t  -  no  bias  correction  being  nada.  Tha  siathod  of  construction  ensures  that  the 

<s  u  w 

-1  2 

operator  s(B)  *  0(8)  ■  1  ♦  ^1  ♦  OjB  ♦  ...  converges,  and  that  both  (2.10)  and  (2.11) 

hold.  This  alternative  differs  vary  little  from  (6.5),  being  sinply  a  natter  of  replacing 

*  by  t  and  T  by  t  throughout  (6.3)  and  (6.4).  Besides  reducing 
l.J  XlJ  l  " 

confutation,  wa  shall  discover  other  advantages  of  this  astlnate  of  G  which  wa  shall  call 
tha  AR(H)  satinets. 

He  have  also  Investigated  the  use  of  spectral  estlnates  of  G.  Assuning  that  tha 
spactrun  of  the  data  is  estinatad  using  a  classical  lag  window  wfc  applied  to  tha  sample 
acf,  with  a  truncation  lag  M' ,  estlnates  of  ths  coefficisnts  and  0^  nay  be 
obtained  by  spectral  factorization,  and  their  properties  havs  been  studied  by  Bhaneall 
(1974,  1977).  it  is  possible  to  implement  this  by  tha  Cranir-Wold  factorisation.  Given 
tha  windowed  acf  ,  k  «  1  ...  M',  this  supplies  tha  coefficients  ...  in  a 


-11- 


high  order  MA(M' )  model.  Then  6^  »  -8^.  The  factorization  can  be  done  rapidly  and 

accurately  using  algorithms  such  as  that  given  by  laurie  (1962).  The  matrix  G  is  again 

constructed  using  (2.12),  the  elements  of  being  the  windowed  autocovariances,  and 

2 

0a  the  innovation  variance  from  the  MA(M’)  model.  He  shall  call  this  the  HA(H* ) 
estimate  of  G. 

The  foregoing  estimates  of  G  all  suffer  from  the  limitations  on  the  accuracy  of 
as  an  estimate  of  when  the  series  borders  on  being  non-stationary  in  the  sense  of 

ARIMA  models.  Tsay  and  Tiao  have  paid  particular  attention  to  this  situation  and  recoamiend 
that  the  construction  of  G  be  based  on  the  true  least  squares  regression  equations  of  the 
future  upon  the  past.  Thus  in  our  notation  their  estimate  of  T  would  be  given  by 

-1  1+N 

r  -  <N  -  L)  l  xx  >  i,j  -  1  ...  L 
13  t«1+L  3 

where  here  L  -  K  +  m.  He  call  this  the  regression  estimate  of  G. 

Xn  the  following  proposals  for  how  G  should  be  analysed,  we  shall  assume  that  the 
AR(M)  or  HA(M')  estimate  is  used. 


7.  ANALYSIS  OF  THE  PREDICTION  COVARIANCE  MATRIX 


It  is  convenient  now  to  reverse  the  order  of  the  rows  and  columns  of  G  to  correspond 

with  the  sequence  of  variables  x(k),  k  «  1,2  ...  K,  whose  covariance  matrix  it 

n 

estimates.  This  also  allows  a  kind  of  open-ended  approach,  so  that  G  can  be  analysed  for 


increasing  order  K.  The  rearrangement  corresponds  to  using,  in  place  of  (2.12), 

G  ‘  rFF  -  Vr °.2  * 

we  shall  write  G^  for  the  upper  left  (p  +  1)  *  (p  +  1)  submatrix  of  G.  The 
procedure  we  suggest  is  to  factorize 


G  -  LDL'  (7.1) 

where  L  is  lower  triangular  with  unit  diagonal,  and  D  is  diagonal.  In  practice  we  have 
used  a  Choleski  factorization  from  which  L  may  be  obtained  by  simple  scaling  of  the 
columns,  and  D  by  squaring  the  diagonal.  It  is  valuable  to  use  a  method  which  stops 
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(without  catastrophic  failure)  when  a  diagonal  element  of  D  is  found  to  be  effectively 
zero  (using  numerical  rather  than  statistical  criteria).  For  recent  use  of  Choleaki 
factorization  in  similar  contexts  see  Newton  and  Pagano  (1983)  and  Hawkins  and  Eplett 
(1982). 

The  motivation  is  that  the  diagonal  elements  dg.d^  ...  of  D  constitute  the 

2  2  2 

sequence  of  regression  error  variances  ”  °e  “  corresponding  to  the  regression 

(5.2)  for  increasing  p  »  q  *  0,1,2  ...  .  Furthermore  L  1  “  T  say,  is  again  lower 
triangular  and  its  successive  rows  contain  the  estimates  of  the  AR  parameters  in  (5.2), 
i.e. 


<-♦  .  .  ...  U 

p  p-i  i 


(7.2) 


for  p  «  0,1,2  ...  .  The  row  t_  minimises  the  quadratic  form  t  G  t‘  w.r.t.  variations 

p  P  P  P 

in  1 ^  -  $  and  yields  d^  as  the  minimum.  That  d^  is  a  decreasing  sequence  is  seen 

from  the  fact  that  the  special  structure  of  G  implies 


t  ,G  t  *  <  t  G  t*  -  d 

p+1  p+1  p+1  p  p  p  p 

where  t^+1  *  (0,t^).  The  minimising  vector  tp+,  must  then  obviously  yield  d^+1  <  d^. 

Also  note  that  the  matrix  H  «  T T  is  lower  triangular  with  rows  consisting  of  the 

estimates  of  the  MA  parameters  in  (5.2), 

h 


(“V  'Vi  —  “V ij  • 


(7.3) 


Another  interpretation  of  the  procedure  may  be  deduced  from  (2.11).  This  is  that 
♦1  ...  4  result  from  the  regression  of  the  first  column  of  7^  as  defined  in  (2.9)  upon 
the  remaining  columns,  taking  K  *  p  ♦  1.  If  we  were  to  calculate 


H(B)  -  ♦(b)*(B)  -  l  h  Bk 
k-0 


r  2 

this  would  correspond  to  choosing  ♦(B)  so  that  l  h  was  minimised  (giving  d  ),  with 

_  p+i 

9(b)  given  by  truncation  of  H(B)  at  tr  .  It  is  clear  therefore  that  a  value  of 
dp  «  0  indicates  not  only  a  singularity  in  the  estimated  matrix  Gp,  but  also  an  exact 
representation  of  7(B)  ”  1/*(B)  ■  S(B)/7(B). 
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Th«  calculation*  needed  to  conatruct  G  ara  ralativaly  aodast ,  and  for  moderate 


values  of  p  thara  **—  littla  naad  to  exploit  tha  racuraion  advocated  by  HR.  However , 
it  ia  wall  worth  examining  what  they  propoaa.  Our  calculation*  are  exactly  equivalent  to 
a pacifying  tha  joint  covariance  atructura  of  tha  bivariate  proceaa  ( xfc ,  )  aa  followa 


cxx(k) 


c  (k) 

aa 


c 

ax 


(k) 


ck 

•  k  - 0 

0  .  k  *  0 

♦kV  k  >  0 

0  ,  k  <  0 


(7.4) 


Owing  these,  wa  aaa  that  tha  bivariate  AR(p)  fit  of  the  regreaaion  (S.2)  in  the  caaa 
p  “  q,  may  be  written 


(7.5) 


The  inaertion  of  exact  xeroa  in  part*  of  (7.4)  lead*  to  the  exact  0  and  1  in 
(7.5).  which  may  help  to  reduce  computation  if  the  recuraion  of  Whittle  (1963)  ia  uaed. 

A  uaeful  reault  derive*  from  the  fact  that  the  matrix  operator  in  (7.5)  will  aatiafy  a 
bivariate  atationarity  condition.  Becauae  of  ita  simple  structure,  this  implies  that 
4(B),  aa  specified  by  any  of  the  rowa  (7.2),  satisfies  the  univariate  atationarity 
condition.  There  is  no  similar  invertibility  constraint  upon  6(B)  except  if  dp  •  0, 
whence  it  follows  since  0(B)  «  4(b)6(B)  exactly.  If  a  value  of  p  is  chosen  with  dp 
small  then  6(B)  is  close  to  4(B)4(B),  so  that  0(B)  is  likely  to  be  invertible. 
Otherwise,  applying  one  of  the  many  devices  for  mapping  6(B)  into  an  invertible  form  can 
be  expected  to  result  in  only  minor  modification  of  its  parameters.  In  practice  we  have 
always  found  the  selected  values  of  the  ARMA  parameters  to  satisfy  atationarity  and 
invertibility  conatraints.  This  is  a  major  advantage  when  they  are  used  as  initial  values 
in  maximum  likelihood  estimation. 


8.  STATISTICAL  FROCBDORBS 


We  reco— e nd  th«  following  procedural,  The  justification  of  tha  statistical 
properties  which  ws  dels  is  not  included  here,  becuase  we  only  haws  proofs  In  outline 
fora*  They  are  however  in  accord  with  the  usual  ideas  of  regression. 

The  selection  of  the  order  M  used  in  the  AR(K)  estimate  of  G  aust  be  considered, 
but  it  does  not  appear  to  be  very  critical.  It  is  iaportant  that  it  be  sufficiently  large 
that  the  autoregressive  estimates  of  «k  have  relatively  saall  bias,  but  not  so  large  that 

m 

the  sequence  *k  is  unnecessarily  extended  beyond  the  point  where  is  effectively 

sero.  If  the  AIC  or  siailar  criterion  is  used  we  suggest  that  the  value  so  chosen  for  M 
be  doubled.  The  choice  of  X  should  be  at  aost  M/2.  If  the  MA(M'  )  estimate  of  G  is 
used,  the  equivalent  choice  of  M1  is  such  that 


e.g.  for  the  Parsen  lag  window,  M1  ■  3.71m. 

We  suggest  that  various  statistics  be  plotted.  Because  their  range  of  variation  Is 
large  in  some  cases,  we  have  had  to  truncate  some  of  these  on  the  vertical  scale  in  the 
following  figures,  in  order  to  reveal  the  detail  of  the  smaller  values. 

We  first  plot  the  diagonal  eleawnts  of  G  because  these  estimate  the  prediction 

m 

variances  v  -  Var(x  (k>)  for  increasing  lead  time  k  ■  1,2,...  .  If  these  die  rapidly 
k  n 

to  sero  there  may  be  rather  little  information  upon  which  model  selection  can  be  based, 

unless  the  series  length  n  is  large. 

Next,  plot  the  sequence  d0,d1  ...  which  estimate  the  residual  variance 
2  2  2 

■  «#  *  cq  in  the  regression.  These  will  usually  become  effectively  sero  at  index 
k  -  M/2  provided  M  is  moderately  large,  because  then  we  should  expect  4(B) /8(B)  with 
2p  •  M  parameters  to  provide  a  very  close  match  with  «(B)  which  has  order  M.  On 
occasions,  dfc  has  become  sero  at  a  lower  index,  and  through  bad  numerical  conditioning 
this  may  lead  to  increasing  values,  Hie  sequence  mist  be  truncated  at  this  point.  This  is 
no  great  problem  since  it  merely  means  that  an  ARMA  (p,p)  model  has  been  found  which  is 
virtually  coincident  with  the  AR(M)  model.  In  practice  we  scale  the  sequence  d^  by 
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<  r 


2 

taking  in  ita  place,  Nd^/o^# 

to  v 


and  we  shall  henceforth  mean  this  scaled  value  when  we  refer 


He  also  plot  the  sequence  of  reductions  in  the  residual  variance,  8^  »  dy  ^  -  dy, 
and  a  sequence  of  (bivariate)  partial  correlations  Ry  defined  from  the  regression  by 

«k2  ■  V<Vi *  H)  • 

Both  8y  and  Ry  measure  the  improvement  in  the  regression  (5.2)  on  introducing  the 
pair  of  variables  xfc_y ,  . 

Assuming  a  true  ARMA  (pg'Pg)  model  for  xt,  the  approximate  properties  for  large 
N  upon  which  we  base  order  selection,  are  that  for  k  >  pQ,  the  8y  form  a  sequence  of 
tID  random  variables  having  chi-squared  distributions  with  2  d.f.,  i.e.  exponential  with 
expectation  2.  For  k  <  pQ,  the  expectation  of  8y  will  be  inflated. 

He  take  NR^  to  have  approximately  the  same  distribution  as  8y  for  k  >  pQ.  The 
plot  of  Ry  may  therefore  be  scanned  and  treated  in  a  very  similar  manner  to  that 
recoamended  by  BJ  for  examining  partial  autocorrelation  functions.  It  is  always  positive 
however,  with  median  of  1.18//N  and  upper  90%  point  2.15/n.  The  'cut-off  point' 
indicates  the  true  order  p0.  Alternatively,  the  upper  90%  point  of  4.6  may  be  used  for 


The  appearance  of  the  residual  variance  plot  dy  may  be  deduced  to  be  a  random  walk 
for  i  >  p„,  with  downward  steps  8^,  reaching  0  at  k  -  m/2.  It  may  be  possible  to 
base  tests  for  the  model  order  on  detecting  the  point  at  Which  this  plot  falls  within  the 
expected  range  of  such  a  random  walk. 


9.  APPLICATIONS 

He  did  not  carry  out  an  extensive  simulation  study.  He  did  however  simulate  several 
of  the  models  used  by  HR  and  use  one  of  them  for  illustration.  This  was  the  model 

(1  ♦  .64B  +  ,7B2)xt  •  (1  ♦  .8B)at 

2 

with  ■  1.  He  took  N  •  50  and  set  M  “  10.  Figure  1  shows  the  series  and  Figure  2 
the  four  plots  derived  in  the  analysis.  The  order  p  ■  2  is  clearly  indicated,  with  some 
suggestion  that  p  “  3  might  be  considered.  For  p  "  2  the  preliminary  estimates  gave  a 
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3 


■odal  in  good  agreement  with  that  aimulatadi 

(1  +  .62B  +  .80B2>Kt  -(It  .6 IB  -  .1lB2)at  . 

w*  next  conaidarad  a  aariaa  which  wa  had  pravioualy  aodallad,  and  wara  intaraatad  to 

diecover  to  what  axtant  tha  nathod  yialdad  agreement  with  our  pravloua  concluaiona.  Tha 

aariaa  waa  tha  monthly  wholaaala  price  index  for  Italy  fraa  1970  to  1976.  The  difference 

of  tha  logarithm  wara  uaed,  and  ara  ahown  in  Figure  3.  Tha  previoua  analyaia  appeared  in 

Piccolo  and  Tunniclif fa  Wilaon  (1981).  There  ara  83  valuaa  and  wa  choee  M  -  20.  Figure  4 

ahowa  tha  reaulting  plota.  Ihe  order  p  “  2  ia  indicated,  with  6j  -  6.754,  exceeding 

2 

the  upper  95*  point  of  Xj«  Tha  value  of  •  3.97  ia  not  vary  large,  but  etanda  clear 

of  tha  regaining  part  of  thia  plot.  n>e  aariaa  ahowa  eoam  evidence  of  outlier  behaviour 
which  tenda  to  deflate  aaapla  autocorrelation*  and  aiallar  atatietica,  ao  wa  might  be 
perauaded  to  conaidar  p  -  3  aa  a  aerioua  poaaibility.  Tha  preliminary  aatlmataa  for  thia 
choice  gave  the  model 

(1  -  1.63B  +  1.24B2  -  .45B3)xt  -  (1  -  .598  ♦  .43B2  -  .04B3)at  . 

Thia  agreed  vary  well  with  our  pravioua  modelling  experience  which  after  aome  trial 
and  error,  had  lead  ua  to  an  MMA  (3,2)  model  with  very  aimilar  parametera.  The  apectrum 
of  thia  model  had  a  moderate  peak  eorreepondlng  to  a  period  of  6.6  month* . 

Finally  we  conaidered  the  aunapot  aariaa, and  for  model  aelection  took  tha  firat  230  of 

the  aet  of  256  valuea  from  1700  to  1955  due  to  WalAaeier  (1961).  We  applied  a  aquare-root 

tranaformation  giving  the  aariaa  ahown  in  Figure  5.  We  tried  analyaea  with  both  N  -  30 

and  M  -  40.  Figure  6  ahowa  the  reaulting  plota.  In  thia  example  waa  aingular  for 

p  -  7  which  ia  atrongly  indicated  aa  the  order  to  be  choaen,  aince  •  30.9  far  exceeda 

2 

the  99*  aignificance  level  of  x2*  Preliminary  aatimatea  gave  the  model 

(1  -  1.73B  ♦  1.05b2  -  . 02B3  -  .02B4  -  .4»5  ♦  .72B6  -  .4lB?)xt 
-  (1  -  . 59B  ♦  .02B2  ♦  . 04B3  ♦  .  33B4  -  .50B5  ♦  .25B6  ♦  .0lB7)*t 
Thia  la  a  atationary  and  invertible  model  which  waa  uaed  aa  the  etarting  point  in 
exact  likelihood  eatimation  ualng  the  GBN8TAT  package.  Convergence  effectively  occurred 
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SIMULATED  SERIES 


Figure  1. 


PREDICTION  VARIANCE 


RESIDUAL  VARIANCE 
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WHOLESALE  PRICES.  1970-76 


Figure  3. 


Figure  4.  Plots  for  ths  wholesale  prices  series. 
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Figure  5. 


after  IS  Iteration#  to  the  model i 

(1  -  2.00B  -  1.39B2  ♦  .  12B2  -  .25B4  -  .92B5  ♦  1.41B6  -  .66B?>xt 
-  (1  -  .BOB  *  .OOB2  *  .35B3  ♦  . 52B4  -  .99B5  ♦  .28B6  +  ,2SB7)at 
examination  of  the  parameter  standard  errora  Indicated  poseible  truncation  to  ARMA  (7.5). 
The  eatimated  reaidual  variance  was  0.999,  and  the  residual  autocorrelation  pattern  was 
extremely  white  in  appearance.  The  spectrum  of  the  fitted  model  is  shown  in  Figure  7.  the 
square  root  of  the  spectrum  is  plotted,  so  as  to  reveal  the  smaller  peaks,  one  of  which 
appears  to  be  a  harmonic  of  the  very  sharp  main  peak. 


Figure  7. 


There  was  some  concern  that  the  use  of  sample  autocorrelations  might  be  inadequate  in 
the  analysis  of  such  a  highly  autocor related  series.  The  investigation  was  therefore 
repeated,  with  the  AR{40)  model  for  the  *  coefficients  being  fitted  by  exact  likelihood, 
and  corresponding  theoretical  acf  values  being  used  in  the  construction  of  6. 

The  resulting  plots  were  however  very  similar. 
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Furthermore,  this  data  set  was  used  to  investigate  the  spectral  factorization  or 
MA(M' )  approach  to  estimating  G,  as  described  in  section  6.  The  sample  acf  was  weighted 
using  a  Parzen  window  with  a  cut-off  at  lag  H'  =  110,  corresponding  to  a  value  of 
M  “  30.  Again  the  results  were  extremely  close  to  those  previously  obtained.  The  rather 
unexpected  result  of  the  singularity  of  G7  when  very  large  values  of  M  were  used, 
suggests  that  perhaps  the  series  is  somewhat  more  deterministic  than  we  had  supposed. 

Based  on  these  and  other  examples  our  conclusion  is  that  the  method  succeeds  in 
leading  us  to  ARMA  models  which  well  approximate  the  structure  of  series. 


10.  AN  INVERSE  PROCEDURE 


In  section  7  we  showed  that  the  method  could  be  viewed  as  estimating  the  AR  parameters 
in  the  regression 


♦j  =  *1*3-1  +  —  +  Vj-P  £°r  j  >  p 

using  the  estimates  of  4^.  This  prompts  us  to  consider  the  inverse  problem  of  estimating 
the  MA  parameters  in  the  regression 

wj  “  Vj-i  +  •••  +  Vj-P  for  j ' p ' 

using  the  estimates  of  This  of  course  ties  in  very  closely  with  the  proposal  of 

Durbin  (1959)  for  estimating  parameters  in  the  MA(q)  model,  though  his  method  starts  the 
regression  from  j  «  1.  The  possibility  of  starting  the  regression  from  j  ■  p  is,  we 
believe,  suggested  by  Fuller  (1976)  p.  359  in  the  case  of  an  ARMA(p,q)  model. 

To  implement  this  idea  we  merely  treat  the  w  weights  from  the  AR(M)  model  as  the  ♦ 
weights  of  the  Inverse  model 
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The  corresponding  inverse  autocovariances  are  simple  estimated  as 

2  M-k 

ci.  =0  y 

k  a  ,L  i  i+k 
i*0 

and  the  G  matrix  of  the  inverse  model  derived.  It  may  however  be  directly  calculated 
(with  rows  and  columns  in  the  order  used  for  Choleski  factorization)  as 
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for  1  <  k  <  j  . 


2  *? 

Gijk  ‘  *«  J0  VjVk 

He  applied  this  idaa  to  tha  aarlaa  of  annual  variations  in  day  length  from  1820  to 
1970.  we  used  the  firet  differences  of  the  data  which  are  shown  in  rigure  8,  and  chose 
M  -  40.  the  plots  shown  in  Figure  9  clearly  indicate  p  ”  2,  and  provide  the  prells inary 
model  for  the  original  series  (after  interchanging  the  estimates  of  0  and  4)  of 
(1  -  .318  -  .05B2)7xt  -  (1  ♦  .558  +  .55B2)at  . 

Note  now  that  it  is  the  NA  operator  6(8)  which  hae  its  invertibility  ensured,  this 
■odel  is  very  close  to  that  previously  selected  by  conventional  ARMA  modelling: 

(1  -  .39B)V«t  «  (1  ♦  .55B  ♦  .62B2)at  . 

We  should  mention  that  the  method  when  applied  directly  to  this  series,  as  for  the 
examples  in  section  9,  had  indicated  a  different  result,  that  p  -  3,  with  preliminary 
model: 


(1  -  .958  -  . 06B2  ♦  .2B3)?Xt  -  (1  -  .10B  -  .00B2  -  .40B3)at  . 

After  estimation  of  this  MIA  (3,3)  model,  it  wee  found  to  be  similar  to  the  AIMA 
(2,2)  model,  but  with  a  nearly-cancelling  extra  linear  factor  on  each  side. 

In  this  case,  we  know  that  Vxfc  has  a  spectrum  which  has  a  near  zero  around  frequency 
2v/3,  and  very  little  power  above  that,  due  to  prior  data  maoothing.  This  leads  to  a 
strong  HA  operator  in  the  model,  which  the  inverse  method  seems  better  at  detecting. 


11.  THE  CASE  OF  UNEQOAX.  ORDERS . 

The  foregoing  section  prompts  consideration  of  this  case.  We  propose  that  our  method 
■ay  be  adapted  for  p  *  q  as  follows.  In  the  case  that  p  -  q  ■  r  >  0,  the  equation 
(2.2)  for  the  forecast  function  involves  r  values  •••  xnei-r  in  the  set  P  of 

past  values.  The  whole  analysis  can  then  be  carried  out  by  allowing  the  future  F  and 
past  P  to  be  no  longer  a  simple  partition,  but  to  Include  the  above  variablee  as  an 
overlap.  This  is  handled  with  great  simplicity  by  prefixing  rows  and  columns  of  to 

G,  to  give  G+r  say.  For  example  if  r  -  1, 
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Figure  8. 


Figure  9.  Plots  for  the  daylength  series. 
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Further  identical  rows  and  columns  are  added  for  r  “  2,3  ...  .  Akaike  (1974)  recommends 
that  F  and  P  overlap  by  xn>  so  that  his  canonical  correlation  approach  is  oriented 
towards  ARMA  (p.p  -  1)  models. 

Proceeding  with  the  Choleski  factorisation  of  G+r  now  yields  dQ,d1  ...  dr_^  as 
estimates  of  the  prediction  error  in  AR(k)  models  for  k  ■  0  ...  r  -  1.  Me  suggest  that 
the  plots  previously  recommended  be  now  started  from  p  ■  r,  i.e.  for  q  “  0,1,2  ...  . 

In  the  case  q  -  p  “  r  >  0,  we  take  a  gap  of  length  r  between  the  future  and  the 
past,  and  this  is  implemented  merely  by  removing  the  first  r  rows  and  columns  of  G,  to 
give  G_r  say. 

By  this  means  the  diagonals  of  an  array  for  all  p  >  0  and  q  >  0  may  be  built  up 

out  of  the  statistics  d(p,q),  and  used  in  various  ways  to  select  p  and  q.  Me  are 

rather  reluctant  to  recommend  the  use  of  such  a  proliferation  of  statistics,  except 

possibly  in  the  application  of  an  automatic  method  such  as  that  in  HR.  It  is  also  of 

* 

Interest  to  note  that  for  any  given  p,q  the  estimates  4  may  be  viewed  as  being  obtained 
by  regressing  the  first  column  of  Vp  in  (2.9)  upon  the  remaining  columns,  where  the 
number  of  columns  K  »  p  +  1,  but  now  the  partition  is  obtained  by  removing  the  first 
q  +  1  rows  of  T .  For  q  »  0  the  estimates  of  the  AR(p)  model  are  simply  the  Yule- 
Halker  estimates  which  are  known  to  be  asymptotically  equivalent  to  the  HLCs. 

It  may  also  be  shown  that  for  any  p,q  the  parameters  can  be  calculated  using  the 
bivariate  autoregression  of  HR  with  the  entries  in  (7.4)  slightly  modified.  For 
p  -  q  *  r  >  0,  the  entry  4^  is  replaced  by  4k_r  which  is  taken  as  0  for  k  <  r.  For 

q  -  p  «  r  >  0,  4k  is  replaced  by  4^+r  and  cfc  replaced  byj 

ck  -  ( j0  Vi*kK  • 

This  reassures  us  that  for  all  p,q  the  estimate  of  4(9)  is  stationary. 
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Thera  la  of  coursa  a  corresponding  array  which  My  be  calculated  using  the  inverse 
autocovariances.  The  estiMtes  of  8(B)  associated  with  this  array  will  similarly  always 
be  invertible  for  any  p,q.  For  p  ■  0  they  are  the  Durbin  estimates  which  are  proved  by 
Bhansali  (1980)  to  be  asymptotically  efficient  provided  M  is  allowed  to  grow  as  a 
specified  function  of  N. 

12.  EFFICIENCY  FOR  MIXED  MODELS. 

The  calculation  of  fully  efficient  estimates  of  4,8  in  the  case  p  >  0  and  q  >  0 
is  considered  by  HR.  They  use  the  preliminary  estimates  in  a  manner  described  also  in 
Fuller  (1976),  to  set  up  a  regression  for  parameter  corrections.  After  applying  these 
corrections  to  the  prelisiinary  estimates,  corrected  estimates  are  obtained  which  have  the 
same  asymptotic  distribution  as  the  MLEs. 

It  would  be  attractive  if  a  direct  regression  were  available  *rtiich  supplied  good 
approxiMtions  to  the  MLEs.  This  seems  unattainable,  but  we  describe  a  procedure  which 

gives  some  encouragement.  We  claim  that  the  regression  (5.4)  provides  such  estimates, 

1  /2 

which  agree  with  the  MLE  to  within  op(n  ),  provided  the  filter  L(B)  is  appropriately 
chosen.  One  such  choice  is  to  use 

xj  «  1/8(B)xt,  a*  -  1/8(B)afc  (12.1) 

and  another  is 

x*  -  1/*<B)at,  a*  -  1/4(B)xit  (12.2) 

where  xifc  »  it (Bla^  is  the  estiMted  inverse  series,  and  x(B)  the  operator  estimated  in 
(5.1).  Thus  (12.1)  corresponds  to  L(B)  “  1/8{B)  and  (12.2)  to  L(B)  m  *(B)/4(B).  The 
first  choice,  in  the  case  of  q  “  0,  i.e.  8(b)  “  1  gives  the  Yule-Walker  estimates  of 
4,  the  second  choice  in  the  case  p  ■  0,  4(B)  “  1  gives  the  Durbin  estiMtes  of  8.  in 
any  other  case  some  approximation  to  8(B)  or  4(B)  must  be  used  in  order  to  get 
reasonable  efficiency.  One  cannot  always  rely  on  getting  good  preliminary  estiMtes  of 
these  by  using  the  Min  method  of  this  paper.  We  have  for  example  simulated  a  sample  of 
size  100  from  the  signal-plus-noise  model 

(1  -  .98B)xt  -  (1  -  .83B)at  . 


-26- 


The  method  co^lauly  fat  1*4  to  identify  the  nodal ,  although  the  aanpla  act  hint*  at 
th#  nodal  by  showing  a  tendency  to  positive  values  at  low  lags.  niara  la  Just  eufflclant 
Information  in  tba  data  to  obtain  noil  daflnad  Mia  9  -  .64(81  .26)  and  9  "  .76(81  .11), 
though  raaaonably  good  prallnlnary  aatlnataa  aro  nncoaaary  to  find  thaaa. 

ft  would  appear  to  ue  that  part  of  the  aklll  In  Identifying  MM  node  la  fra*  the 
sample  aef,  1 laa  In  visual  filtering  of  the  aef  in  order  to  pick  out  the  patterns.  This 
aaana  to  ha  equivalent  to  selecting  the  filters  in  (12.1)  or  (12.2),  to  amplify  the  signal 
component  of  the  data.  Often  the  signal  is  at  a  low  frequency  and  then  on*  nay  be  able  to 
obtain  tnich  improved  reeulta  fro*  the  regression  (6.4)  by  a  very  rough  guess  at  a  low 
frequency  selection  filter  to  use  in  place  of  1/9(8)  in  (12.2). 

We  hope  to  Investigate  automatic  heuristic  methods  of  selecting  such  filters,  e.g.  by 
scanning  the  aeries  spectrum  to  locate  the  frequency  bands  of  signal  components.  The 
orders  in  the  regression  (6.4)  nay  then  be  determined  by  criteria  such  as  88  use,  or  by 
other  methods  of  selecting  variables  In  regression.  Unfortunately  the  attractive 
etat loner  I ty  property  of  4(S)  Is  no  longer  certain,  but  In  general.  Improved  estimates 
wilt  result.  Nsf tneawnt  of  these  by  a  few  iterations  of  ML  estimation  will  still  be 
recuswisnded. 

On  a  final  point,  wa  mention  that  order  identification  and  preliminary  estimation 
methods  have  bean  successfully  applied  In  the  smltiveriate  tlam  aeries  contest.  Cooper  and 
Mood  (1482)  use  canonical  oross-cor relat ion  analysis,  and  Tiso  and  Taay  (1983)  have 
awtended  their  regression  methods.  Us  espect  that  the  method  proposed  in  this  paper  will 
readily  generalise,  and  It  nay  be  worthwhile  exploiting  the  multivariate  recursion  of 
Whittle  not  only  to  obtain  multivariate  Innovations,  but  also  In  an  extension  of  MS's  us* 
of  this  algorithm. 
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